Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark
Abstract
BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured with bali_score, an application distributed together with the database. The standard accuracy measures are the Sum of Pairs (SP) score and the Total Column (TC) score. We have found that, for columns outside the core blocks, the results calculated by bali_score differ from those obtained from the formal definitions of these measures. We do not claim that either variant is better than the other, but they are definitely different. This can cause confusion when alignments produced by different methods are compared. Therefore, we propose a new nomenclature for multiple sequence alignment quality measures that makes clear which variant was actually calculated. Moreover, we have found that a gap in some column of the first sequence of the reference alignment causes that column to be discarded.
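To make the difference between the two variants concrete, the Python sketch below computes SP and TC under one common reading of their formal definitions: SP is the fraction of residue pairs aligned in the reference that the test alignment also aligns, and TC is the fraction of reference columns (here, those with at least two residues, which is an assumption) that appear unchanged as a column of the test alignment. All names are hypothetical: sp_score, tc_score, a mask argument standing in for a core-block annotation, and a skip_gap_in_first flag that imitates the column discarding described above. This is not the code of bali_score, and reading core blocks from the BAliBASE files is not shown.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Set, Tuple

GAP = "-"
# A residue is identified by (sequence index, position in its ungapped sequence).
Position = Tuple[int, int]


def column_positions(alignment: Sequence[str]) -> List[List[Position]]:
    """Return, for every alignment column, the residues that column contains."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    seen = [0] * len(alignment)  # residues consumed so far in each sequence
    columns: List[List[Position]] = []
    for c in range(width):
        column: List[Position] = []
        for s, row in enumerate(alignment):
            if row[c] != GAP:
                column.append((s, seen[s]))
                seen[s] += 1
        columns.append(column)
    return columns


def check_same_sequences(test: Sequence[str], reference: Sequence[str]) -> None:
    """Both alignments must contain the same ungapped sequences in the same order."""
    if [r.replace(GAP, "") for r in test] != [r.replace(GAP, "") for r in reference]:
        raise ValueError("test and reference align different sequences")


def reference_columns(
    reference: Sequence[str],
    mask: Optional[Sequence[bool]] = None,
    skip_gap_in_first: bool = False,
) -> List[List[Position]]:
    """Reference columns that take part in scoring.

    mask: hypothetical per-column flags (e.g. True for core-block columns).
    skip_gap_in_first: hypothetical emulation of discarding every column
    in which the first reference sequence has a gap.
    """
    columns = column_positions(reference)
    if mask is not None and len(mask) != len(columns):
        raise ValueError("mask must have one entry per reference column")
    kept = []
    for c, column in enumerate(columns):
        if mask is not None and not mask[c]:
            continue
        if skip_gap_in_first and reference[0][c] == GAP:
            continue
        kept.append(column)
    return kept


def aligned_pairs(columns: Sequence[List[Position]]) -> Set[Tuple[Position, Position]]:
    """Every pair of residues sharing a column (ordered by sequence index)."""
    pairs: Set[Tuple[Position, Position]] = set()
    for column in columns:
        pairs.update(combinations(column, 2))
    return pairs


def sp_score(test, reference, mask=None, skip_gap_in_first=False) -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(reference_columns(reference, mask, skip_gap_in_first))
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(column_positions(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference, mask=None, skip_gap_in_first=False) -> float:
    """Fraction of reference columns (>= 2 residues) reproduced exactly in the test."""
    check_same_sequences(test, reference)
    ref_cols = [
        frozenset(column)
        for column in reference_columns(reference, mask, skip_gap_in_first)
        if len(column) >= 2
    ]
    if not ref_cols:
        return 0.0
    test_cols = {frozenset(column) for column in column_positions(test)}
    return sum(column in test_cols for column in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["A-GT", "ACGT", "ACGT"]
    test = ["AG-T", "ACGT", "ACGT"]
    print("all columns:   SP", sp_score(test, reference), "TC", tc_score(test, reference))
    print("gap-skipped:   SP", sp_score(test, reference, skip_gap_in_first=True),
          "TC", tc_score(test, reference, skip_gap_in_first=True))
```

For example, with the reference A-GT / ACGT / ACGT and the test AG-T / ACGT / ACGT, the sketch gives SP = 0.8 and TC = 0.5 when every column is scored. With skip_gap_in_first=True, the column in which the first reference sequence has a gap is dropped, and the values become 7/9 and 2/3. The same test alignment thus receives different scores depending on which convention is applied, which is why it matters to state which one was used.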
Journal title:
- Applied Mathematics and Computer Science
Volume 19
Publication date: 2009